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Abstract 

The hypervolume indicator is an increasingly popular set measure to 
compare the quality of two Pareto sets. The basic ingredient of most 
hypervolume indicator based optimization algorithms is the calculation 
of the hypervolume contribution of single solutions regarding a Pareto 
set. We show that exact calculation of the hypervolume contribution is 
#P-hard while its approximation is NP-hard. The same holds for the 
calculation of the minimal contribution. We also prove that it is NP-hard 
to decide whether a solution has the least hypervolume contribution. Even 
deciding whether the contribution of a solution is at most + times the 
minimal contribution is NP-hard. This implies that it is neither possible 
to efficiently find the least contributing solution (unless P = NP) nor to 
approximate it (unless NP = BPP). 

Nevertheless, in the second part of the paper we present a very fast 
approximation algorithm for this problem. We prove that for arbitrarily 
given £,(5 > it calculates a solution with contribution at most (1 + s) 
times the minimal contribution with probability at least (1 — ^). Though it 
cannot run in polynomial time for all instances, it performs extremely fast 
on various benchmark datasets. The algorithm solves very large problem 
instances which are intractable for exact algorithms (e.g., 10000 solutions 
in 100 dimensions) within a few seconds. 

1 Introduction 

Multi-objective optimization deals with the task of optimizing several objective 
functions at the same time. As these functions are often conflicting, we cannot 
aim for a single optimal solution but for a set of Pareto optimal solutions. 
Unfortunately, the Pareto set frequently grows exponentially in the problem 
size. In this case, it is not possible to compute the whole front eflftciently and 
the goal is to compute a good approximation of the Pareto front. 
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There are many indicators to measure the quahty of a Pareto set, but there 
is only one widely used that is strictly Pareto compliant [23] , namely the hyper- 
volume indicator. Strictly Pareto compliant means that given two Pareto sets 
A and B the indicator values A higher than B if the Pareto set A dominates 
the Pareto set B. The hypervolume (HYP) measures the volume of the dom- 
inated portion of the objective space. It was first proposed and employed for 
multi-objective optimization by Zitzler and Thiele [2T] . 

It has become very popular recently and several algorithms have been de- 
veloped to calculate it. The first one was the Hypervolume by Slicing Objec- 
tives (HSO) algorithm which was suggested independently by Zitzler [20 and 
Knowles To improve its runtime on practical instances, various speed up 
heuristics of HSO have been suggested [171 HI]- The currently best asymp- 
totic runtime of 0(n log n + n^/^) is obtained by Beume and Rudolph |2] by an 
adaption of Overmars and Yap[ s algorithm [12] for Klee 's Measure Problem [8l. 



There are also various algorithms for small dimensions [3 [TT] . 

From a geometric perspective, the hypervolume indicator is just measuring 
the volume of the union of a certain kind of boxes in M^q, namely of boxes which 
share the reference point ^ as a common point. We will use the terms point and 
box interchangeably for solutions as the dominated volume of a point defines a 
box and vice versa. Given a set M of n points in R*^, we define the hypervolume 
of M to be 

HYP(M) := VOL I U [0, xi] X ... X [0, Xd] j 

\{xi,...,xd)eM / 

In [4] the authors have proven that it is #P-hard ^ in the number of dimension 
to calculate HYP precisely. Therefore, all hypervolume algorithms must have 
an exponential runtime in the number of objectives (unless P = NP). Without 
the widely accepted assumption P 7^ NP, the only known lower bound for any d 
is Q{n\ogn) [J. Note that the worst-case combinatorial complexity (i.e., the 
number of faces of all dimensions on the boundary of the union) of 0(n^) does 
not imply any bounds on the computational complexity. 

Though the #P-hardness of HYP dashes the hope for an exact subexponen- 
tial algorithm, there are a few estimation algorithms [TJ |4] for approximating the 
hypervolume based on Monte Carlo sampling. However, the only approximation 
algorithm with proven bounds is presented in [4 . There, the authors describe 
an FPRAS for HYP which gives an £- approximation of the hypervolume with 
probability (1 — S) in time 0{\og{l/S) nd/e^). 



New complexity results 

We will now describe a few problems related to the calculation of the hyper- 
volume indicator and state our results. For this, observe that calculating the 

■"^ Without loss of generality we assume the reference point to be 0^. 

is the analog of NP for counting problems. For details see either the original paper 
by Valiant ^| or the standard textbook on computational complexity by Papadimitriou |13j . 
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hypervolume itself is actually not necessary in a hypervolume-based evolution- 
ary multi-objective optimizer as the algorithm actually only has to find a box 
with the minimal contribution to the hypervolume. 

The contribution of a box x G M to the hypervolume of a set M of boxes 
is the volume dominated by x and no other element of M. We define the 
contribution CON(M, x) of x to be 

CON(M, x) := HYP(M) - HYP(M \ x). 

In Section [2] we show that this problem is #P-hard to solve exactly. Further- 
more, approximating CON by a factor of 2^ is NP-hard for any e > 0. 
Hence, CON is not approximable. Note that this is no contradiction to the 
above-mentioned FPRAS for HYP as an approximation of HYP yields no ap- 
proximation of CON. 

As a hypervolume-based optimizer is only interested in the box with the 
minimal contribution, we also consider the following problem. Given a set M of 
n boxes in R^, find the least contribution of any box in M, that is, 

MINCON(M) := minCON(M,x). 

xeM 

The reduction in Section |2] shows that MINCON is #P-hard and not approx- 
imable, even if we know the box which is the least contributor. 

Both mentioned problems can be used to find the box contributing the least 
hypervolume, but their hardness does not imply hardness of the problem itself, 
which we are trying to solve, namely calculating which box has the least contri- 
bution. Therefore we also examine the following problem. Given a set M of n 
boxes in R'^, we want to find a box with the least contribution in M, that is, 

LC(M) := argminCON(M,x). 

xeM 

If there are multiple boxes with the same (minimal) contribution, we are, of 
course, satisfied with any of them. In Section |2] we prove that this problem is 
NP-hard to decide. 

However, for practical purposes it most often suffices to solve a relaxed ver- 
sion of the above problem. That is, we just need to find a box which contributes 
not much more than the minimal contribution, meaning that is is only a (1 +s) 
factor away. If we then throw out such a box, we have an error of at most e. We 
will call this £:-LC(M) as it is an "approximation" of the problem LC. Given a 
set M of n boxes in and £ > 0, we want to find a box with contribution at 
most (1 + e) times the minimal contribution of any box in M, that is, 

CON(M,£:-LC(M)) < (l + e) MINCON(M). 

The final result of Section [2] is the NP-hardness of e-LC. This shows, that 
there is no way of computing the least contributor efficiently, and even no way 
to approximate it. 
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New approximation algorithm 

In Section |3] we will give a "practical" algorithm for determining a small contrib- 
utor. Technically speaking, it solves the following problem we call e-S-LC{M): 
Given a set M of n boxes in M^, e > and ^ > 0, with probability at least 1 — S 
find a box with contribution at most (1 + e) MINCON(M). 

Pr[CON(M, e-S-LC{M)) <(! + £) MINCON(M)] > 1 - ^. 

As we will be able to choose S arbitrarily, solving this problem is of high prac- 
tical interest. By the NP-hardness of e-LC there is no way of solving e-S-LC 
efficiently, unless NP = BPP. This means, our algorithm cannot run in poly- 
nomial time for all instances. Its runtime depends on some hardness measure H 



(cf. Section 3.2), which is an intrinsic property of the given input, but generally 
unbounded, i.e., not bounded by some function in n and d. 

However, in Section [4] we show that our algorithm is practically very fast 
on various benchmark datasets, even for dimensions completely intractable for 
exact algorithms like d = 100 for which we can solve instances with n = 10000 
points within seconds. This implies a huge shift in the practical usability of the 
hyper volume indicator. 



2 Hardness of approximation 

In this section we first show hardness of approximating MINCON, which we will 
use afterwards to show hardness of LC and e-LC. We will reduce #MON-CNF 
to MINCON, which is the problem of counting the number of satisfying assign- 
ments of a Boolean formula in conjunctive normal form in which all variables 
are unnegated. While the problem of deciding satisfiability of such formula is 
trivial, counting the number of satisfying assignments is #P-hard and even ap- 
proximating it by a factor of 2^ for any £ > is NP-hard, where d is the 
number of variables (see Roth [T5^ for a proof). 

Theorem 1. MINCON is ifP-hard and approximating it by a factor of 2^^ 
is 'NP-hard for any e > 0. 

Proof To show the theorem, we reduce #MON-CNF to MINCON. Let 
□ (ai, . . . ,a(i) denote a box [0,ai] x ... x [0,a^]. Let / = Afe=i ViGC^ 
a monotone Boolean formula given in CNF with Ck ^ [d] := {l,...,d}, for 
/c G [n], d the number of variables, n the number of clauses. First, we construct 
a box Ak = , . . . , a^, 2^ + 2) C R^+^ for each clause Ck with one vertex at 
the origin and the opposite vertex at (a^^, . . . , a^, 2^ + 2), where we set 

if ieCk . 

, I e [d]. 

otherwise 

Additionally, we need a box 5 = 0(2, ...,2,1) C R^+i and set M = 
{Ai, . . . ^ An^ B}. Since we can assume without loss of generality that no clause is 
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dominated by another, meaning Ci ^ Cj for every i 7^ j, every box Ak uniquely 
overlaps a region [xi, xi + 1] x ... x [xd, Xd -\- I] x [1, 2^ + 2] with Xi G {0, 1}, 
i G [d], so that the contribution of every box Ak is greater than 2^ and the 
contribution of B is at most 2^, so that B is indeed the least contributor. 

Observe that the contribution of B to HYP(M) can be written as a union 
of boxes of the form Bxi,...,xd = [^i^^i + 1] x • • • x [xd^Xd + 1] x [0, 1] with 
Xi e {0,1}, i G [d]. Moreover, B^^^^^^^^^ is not a subset of the contribution of 
B to HYP(M) iff it is a subset of IJ^^i iff it is a subset of some Ak iff we 
have > + 1 for i G [d] iff = 2 for all i with = 1 iff i ^ for 
all i with = 1 iff (xi, . . . , Xd) satisfies f\i^Ck ""^^ some k iff (xi, . . . , Xd) 
satisfies the negated formula / = Vfc=i AieCk "^^^^ implies that Bx-^^...^xd 
is a subset of the contribution of B iff [xi^ . . . ^Xd) satisfies /. Hence, since 
vol(B^,,...,^J = 1, we have MINCON(M) = CON(M,5) = |{(xi, . . . , x^) G 
{0, 1}^ I (xi, . . . satisfies /}|. Thus a polynomial time algorithm solving 
MINCON(M) would result in a polynomial time algorithm for #MON-CNF, 
which proves the claim. □ 

Note that the reduction from above implies that MINCON is #P-hard 
and NP-hard to approximate even if the least contributor is known. Moreover, 
since we constructed boxes with integer coordinates in [0, 2^ + 2] a number of 
b = 0{d^n) bits suffices to represent all coordinates of the n + 1 constructed 
points. Hence, MINCON is hard even if all coordinates are integral. We define 
as input size & + n + where h is the number of bits in the input. We will use 
this result in the next proof. Also note that the same hardness for CON follows 
immediately, as it is hard to compute CON(M, 5) as constructed above. 

By reducing MINCON to LC, one can now show NP-hardness of LC. We 
skip this proof and directly prove NP-hardness of e-LC by using the hardness 
of approximating MINCON in the following theorem. 

Theorem 2. e-LC is 'N'P-hard for any constant e. More precisely, it is 
NV-hard for {I -\- e) bounded from above by 2^ """-^ for some c > 0. 

Proof We reduce MINCON to s-LC. Let M be a set of n boxes in M^, i.e., a 
problem instance of MINCON represented by a number of b bits, so that the 
input size is b -\- n -\- d. 

As discussed above, we can assume that the coordinates are integral. We 
can further assume that d > 2 as MINCON is trivial for d = 1. The minimal 
contribution of M might be 0, but this occurs if and only if one box in M 
dominates another. As the latter can be checked in polynomial time, we can 
without loss of generality also assume that MINCON(M) > 0. 

Now, let V be the volume of the bounding box of all the boxes in M, i.e., 
the product of all maximal coordinates in the d dimensions. We know that V 
is an integer with 1 < y < 2^, as there are only b bits in the input. 
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We now define a slightly modified set of boxes: 

A = {n{ai+2V,a2,...,ad)\a{ai,...,ad) eM}, 

B = n(2V,...,2V), 

Cx = □(!,..., 1,21/ + A), 

Mx = Au{B}U{Cx}. 

The boxes in A are the boxes of M, but shifted along the xi-axis. By 
definition, ai ^ V, i e [d] for all □(ai, . . . , a^^) G M. The contribution to 
HYP (Ma) of a box in A is the same as the contribution to HYP(M) of the 
corresponding box in M as the additional part is overlapped by the "blocking" 
box B. Also note that the contribution of a box in A is less or equal than V. 

The box B uniquely overlaps at least the space [V, 2^] x ... x [V, 2V] (as 
every coordinate of a point in M is less than equal to V) which has volume at 
least V. Hence, B is never the least contributor of Ma. The box Cx then has a 
contribution of VOl([0, 1] x . . . x [0, 1] x [2V, 2^ + A]) = A, so that Cx is a least 
contributor iff A is less than or equal to the minimal contribution of any box in 
A to HYP(Ma) which holds iff we have A < MINCON(M). 

Since we can decide, whether Cx is the least contributor, by one call to 
LC(Ma), we can do kind of a binary search on A. As we are interested in a 
multiplicative approximation, we search for k := log2(A) to be the largest value 
less than equal to log2(MINCON(M)), where k now is an integer in the range 
[0,b]. 

As we can only answer e-LC-queries we cannot do exact binary search. But 
we can still follow its lines, recurring on the left half of the current interval, if 
for the median value /^m we get e-LC{Mx^) = C'a^, where Am = 2^^"^, and on 
the right half, if we get any other result. 

The incorrectness of s-LC may misguide our search, but since we have 

CON(M,£:-LC(M)) <(! + £:) MINCON(M) 

it can give a wrong answer (i.e., not the least contributor) only if we have 
(1 + £)-iMINCON(M) < 2^ < (1 + £) MINCON(M). Outside of this interval 
our search goes perfectly well. Thus, after the binary search, i.e, after at most 
[log2(&)] many calls to e-LC, we end up at a value k, which is either inside 
the above interval (in which case we are satisfied) or the largest integer smaller 
than log2((l+5)"^MINCON(M)) or the smallest integer greater than log2((l + 
£:)MINCON(M)). Hence, we have 

1^ < log2((l + e) MINCON(M)) + 1 

implying 

A = 2'^ < 2(1 + s) MINCON(M). 
Analogously, we get 
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Therefore after 0{\og{b)) many calls to e-LC we get a 2 (1 + e) approximation 

of MINCON(M). Since this is NP-hard for 2 (1 + 5) bounded from above by 
11—C 

2^ for some c > 0, we showed NP-hardness of s-LC in this case. Note that 
this includes any constant s. □ 

The NP-hardness of e-LC not only implies NP-hardness of LC, but also 
the non-existence of an efficient algorithm for e-S-LC unless NP = BPP. The 
above proof also gives a very good intuition about the problem e-LC: As we can 
approximate the minimal contribution by a small number of calls to e-LC, there 
cannot be a much faster way to solve s-LC but to approximate the contributions 
- approximating at least the least contribution can be only a factor of 0(log(6)) 
slower than solving s-LC. This motivates the algorithm we present in the next 
section, which tries to approximate the contributions of the various boxes. 

3 Practical approximation algorithm 

The last section ruled out the possibility of a worst case efficient algorithm for 
computing or approximating the least contributor. Nevertheless, we are now 
presenting an algorithm A that is "safe" and has a good practical runtime, but 
no polynomial worst case runtime (as this is not possible). By "safe" we mean 
that it provably solves e-5-LC, i.e., it holds that 

Pr[CON(M, A{M, s, S)) < (1 + e) MINCON(M)] > 1 - ^. 

As the algorithm is going to approximate the contributions, we cannot avoid 
s and solve LC directly, as with no A) -approximation, for any A > we can 
decide whether two contributions are equal or just nearly equal (and in the latter 
case which one is greater). We consider an s around 10~^ or 10~^ as sufficient 
for typical instances. This implies for most instances that we return the correct 
result as there are no two small contributions which are only a (1 -h e) -factor 
apart. For the remaining cases we return at least a box which has contribution 
at most (1+^) times the minimal contribution, which means we make an "error" 
of s. 

Additionally, the algorithm is going to be a randomized Monte Carlo algo- 
rithm, which is why we need the 6 and do not always return the correct result. 
However, we will be able to set S = 10~^ or even S = 10~^^ without increasing 
the runtime overly. In the following we will describe algorithm A, prove its 
correctness and describe its runtime. 

3.1 The algorithm A 

Our algorithm works as follows. For each box A it determines the minimal 
bounding box of the space that is uniquely overlapped by the box. To do so 
we start with the box A itself. Then we iterate over all other boxes B. B 
dominates A in all but one dimension, then we can cut the bounding box in the 
non-dominated dimension. This can be realized in 0{dn^). 
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Having the bounding box BB^ of the contribution of A we start to sample 
randomly in it. For each random point we determine if it is uniquely dominated 
by A. If we checked noSamples(A) random points and noSuccSamples(A) 
of them were uniquely dominated by A, then the contribution of A is about 

Va ■■= ""moSamZsTT^ ^Q^^^^-^)' ^^^^^ VOL(BB^) denotes the volume of 
the bounding box of the contribution of A. Additionally, we can give an es- 
timate of the deviation of Va from Va, the correct contribution of A (i.e., 
Va = CON(M, A)): Using Chernoff's inequality we get that for 



the probability that Va deviates from Va by more than A{A) is small enough. 
_ We would like to sample in the bounding boxes in parallel such that every 
Va deviates about the same A. We do this by initializing A arbitrarily (e.g., 
A = 1) and then in every iteration decrease A by some factor (e.g., ^) and 
sample in each bounding box until we have A(74) < A. If we then have at any 
point two boxes A and B with 

Va - A{A) >Vb^ A{B) (2) 

we can with good probability assume that A is not a least contributor as we 
would need to have Va — Va > A{A) or Vb — Vb > A{B) for A having a 
less contribution than B (which is necessary for A being the least contributor). 
Hence, whenever such a situation occurs we can delete A from our race, meaning 
that we do not have to sample in its bounding box anymore. Note that we never 
havejto^compare two arbitrary boxes, but only a box A to the currently smallest 
box LC, i.e., the box with V~ minimal. 

We can run this race, deleting boxes if their contribution is clearly too much 
by the above selection equation until either there is just one box left, in which 
case we have found the least contributor, or until we have reached a point 
where we have approximated^!], contributions well enough. Given an abortion 
criterion e we can just return LC (the box with currently smallest approximated 
contribution) when we have 

Vj^ + A{LC) 

< ^ < 1 + £, 

Va - A{A) 

for any box A ^ LC still in the race. If this equation holds, then we can be 
quite sure that any box has contribution at least j^Vj^^ and, similarly, all 
other boxes that are still in the race, too. So, after all, we have solved e-5-LC. 



3.2 Runtime 

As discussed above, our algorithm needs a runtime of at least Vt{din?). This 
seems to be the true runtime on many practical instances (cf. Section [4|. How- 
ever, by Theorem |2] we cannot hope for a matching upper bound. In this section 
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Algorithm 1 A{M,e,S) solves e-S-LC{M) for a set M of n boxes in R"^ 
and e^S > 0, i.e., it determines a box x G M s.t. Pr[CON(M, x) < 
(1 + e) MINCON(M)] >1-S. 

determine the bounding boxes BB^ for all A G M 

initialize noSamples(A) = noSuccSamples(A) = for all A G M 

initialize A 

set S := M 

repeat 

set A := A/2 
for allAeS do 
repeat 

sample a random point in BB^ 

increase noSamples(A) and possibly noSuccSamples(A) 
update Va and A{A) 
until A{A) < A 
od_ 

set LC := argmin{l/A \ A e S} 
for aUAeS do _ _ 

if Va - A{A) >Vj^^ A{LC) then 
S := S\{A} 

od 

od ^ _ _ 

until 151 = 1 or < ^^^tl^^^ < 1 + £ for any LC 7^ A G S' 

return LC 



we present an upper bound on the runtime depending on some characteristics 
of the input. 

For an upper bound, observe that we have to approximate each box A up to 
A = {Va - MINC0N(M))/4 to be able to delete it with high probability: At 
this point, Va > Va — A and Vb < + A, for 5 a least contributor, so that 
Va — Vb > 2 A with probability at least 1 — 6/n. Similarly, we can show that 
the expected value of A where we delete box A is n{VA - MINCON(M)). By 
equation ([T]) we observe that we need a number of 

log(2n/J)vOL(BBA)^ ^ f \og{n/5)vOL{BBA)^ \ 
2n{VA - MINCON(M))2 V i^A - MINCON(M))2 J 

samples to delete box A on average. For the least contributor LC, we need 

(V')-MINCON(M))2 

) many samples until we have finally deleted all 

other boxes, where sec-min(V) denotes the second smallest contribution of any 
box in M. Since each sample takes runtime 0{dn) and everything besides the 
sampling takes much less runtime, we get an overall runtime of 



0{dn{n^\og{n/S) H)), 
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where 



voL(BB^c)^ ^ VOL(BB^) 



(sec-min(V) - MINCON(M))2 ^ {Va - MINCON(M))2 

is a certain measure of hardness of the input. This value is unbounded and 
can even be undefined if there is no unique least contributor. In this case our 
abortion criterion comes into play: With probability (1 — ^) after approximating 
every contribution up to A = ^^MINCON(M) we have Vlc < V^c + A, thus 

Vjj^ < Vlc + A, and Va > Vlc — A for every other box A still in the race. 
Then we conclude 

VA-AiA) -VLC-2A l-2j^ 

for every box LC ^ A ^ S. Hence, the above defined value for A suffices to en- 
force abortion. Since we get this A after noSamples(A) = ^iog(/^/gvg^(B^ 

samples, this yields another upper bound for the overall number of samples, a 
still unbounded but always finite value: 



AeM 

However, for the random testcases that we consider in Section [4] the above 
defined hardness H is a more realistic measure of runtime as there are never two 
identical contributions and not too many equally small contributions. There 
one observes values for H that roughly lie in the interval [n, lOn]. 

3.3 Correctness of our algorithm 

To prove the correctness of our algorithm, we first show the following lemmas. 

Lemma 3. For Va = ^^^I^Si^sS^vOL(BB^) it holds that Ex[Va] = Va. 

Proof. We can see noSuccSamples(74) as a sum of independent, identically dis- 
tributed random variables Xi, . . . , XnoSamples(A) with = 1, if the i-th sample 
point lied inside the space uniquely dominated by A, and Xi = 0, otherwise. 
Observe that Pr[Xi = 1] = Va/vol(BBa). Then we have 

Va 

Ex[noSuccSamples(A)] = noSamples(A) 

VOL(BBy^) 

and by linearity of expectation we get 

'noSuccSamples(A) 



Ex[Va] = Ex 



noSamples(A) 



vol(BBa) 



= Va. □ 
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Furthermore, we can bound the probabihty of a great deviation of Va from 
its mean as fohows. 

Lemma 4. For any box A we have 

Pt[Va-Va> A{A)]<^ and Ft[Va - Va < -A{A)] < ^. 
Proof. We can bound the probabihties as fohows. 
Pr [Va -Va> A{A) 

= Pr 



noSuccSamples(A) , / log{2n/S) ^ 

^ ^vol{BBa) -Va> \ - — — ^ vol(BBa) 



noSamples(A) y 2noSamples(A) 



= Pr 



Pr 



noSuccSamples(A) - ^vo]^( B b"^)^ ~ y ^ log(2n/(5)NoSAMPLEs(A) 



Xi + . . . + XnoSamples(A) -I^>\I^ log(2n/J)NOSAMPLES(A) 



where we write noSuccSamples(74) as a sum of random variables Xi as in 
the proof of Lemma [S] and /i denotes the expected value of the sum of the Xi. 
Chernoff 's inequality then implies 



/ 2 f ./^ log(2n/J)NoSAMPLEs(A)) \ r 

PAVa -Va> A{A)] < exp - ^ ^ = ^. 

^ V yj - F I noSamples(A) J 2n 

Analogously, we get the same bound for Pr[VA — Va < —A{A)]. □ 

From this we can easily derive correctness of the selection: 

Corollary 5. If, at any point, selection rule ([2| applies to boxes A and B, i.e., 
Va - A{A) >Vb^ A{B), then Pt[Va < ^b] < 

Proof To have Va < Vb we need to have Va<Va- A{A) or Vb > Vb ^ A{B) 
(or both). Hence, we can bound the probability of an error using the union 
bound by 

Pt[Va < Vb] < Pt[Va -Va> A{A)] + Ft[Vb -Vb < -A{B)] < ^, 

where the latter inequality follows from Lemma [4j □ 

Since we delete a number of m < n boxes during the algorithm, we need to 
bound the probability of an error during the deletions, where we mean by error 
that the selection rule applies to boxes A and B and we delete B but actually 
Va ^Vb- Basic usage of the union bound leads to the following corollary. 
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Corollary 6. The probability that an error occurs in any of m < n deletions is 
bounded from above by ^S. 

Now, if we end with IS*! > 1, we need to bound the probabihty, that there is 
an error on abortion, meaning that there is some box A e S with Va < j^'^Zc' 



Lemma 7. Aborting with a set S C M, the probability of an error on abortion 

M 

2rj 



is bounded from above by 



Pr[3A eS:VA< ^^lcI < 



1 1 ^ 1^ 

2n * 



Proof. On abortion we know that < < 1 + £ for any LC ^ A ^ S. 

The probabihty of any of the events Vjj^ > V~ + A{LC) and Va <Va- A (A) 
for any LC ^ A e S occurring is bounded by the union bound and Lemma [i] 
by from above. 

If none of those events occurs, then we have for any LC A e S: 

V^^V^^A{LQ 
Va ~ Va-A{A) ' 

as the latter is the maximal value of the first conditioned on < -^^^t-^tt^t^ • 

Va-A(A) 

But the latter is less equal 1 + e for any LC 7^ A G 5*, so that in this case we 
have for ah Ae S: Va> ^Vj^. □ 

This leads to the final result. 

Lemma 8 (Correctness of .4). The probability of A{M,e,S) being a correct 
result of e-LC is at least (1 — 5), i.e., 

Pr[CON(M, A{M, e, 5)) < (1 + MINCON(M)] >l-5. 

Proof. If no error occurs during the deletion phase, then the least contributors 
are still in S after the main loop, as they cannot have been deleted. Then LC 
has contribution at most (1 + 5) MINCON(M) unless there was an error on 
abortion. Thus, we can bound the probability of LC not being a correct result 
by the two aforementioned types of errors. Letting m denote the number of 
deletions, so that after the main loop we have l^*! = n — m we can bound this 
probability using Corollary [6] and Lemma [7| and the union bound another time 
to get 

, ^ l^*! 

PilLC not a correct result] < — S -\ S < S 

n 2n 

Thus, we get the desired bound for the probability of returning a correct result. 

□ 
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3.4 Heuristical improvements 

To increase the practical efficiency of our algorithm, we implemented a few 
further optimizations that decrease the actual runtime. In this section we will 
describe three implemented heuristics. 

3.4.1 Push on A(LC): 

Since we compare all boxes still in the racejdth the currently minimal one LC, 
it is intuitively a good idea to decrease A{LC) faster than all other A (A), i.e., 
if we have a current bound of A{A) < A for any A e S we should sample in LC 
until we have A{LC) < a A for some constant a < 1. This improves the runtime 
bjMip to a factor of 4: If we needed some value of A to distinguish boxes A and 
LC before, we now only need A' = 3^ A for A and A'{LC) = ^q^A. As the 
number of samples needed is proportional to A~^ it changes by a good factor 
of ^ \ for n — 1 boxes and a worse factor of for the one box LC . On 
practical instances a = 0.2 seemed to be a reasonable value. 

3.4.2 Sampling heuristic: 

Is is clear how to find a random point X inside a bounding box BB^. Then we 
have to check whether X lies in a box ^4 7^ 5 G M. If no 5 dominates X, then X 
is counted as a successful sample, otherwise not. Now it suffices test X to lie only 
in a subset of M. Only points with all coordinates bigger than the lower vertex 
of the bounding box BB^ can possibly dominate X. By determining these once 
at the beginning and saving them we get a space requirement of 0{v?^ but 
an improvement of the runtime of between one and two orders of magnitude. 
Furthermore, we can decide to rearrange these points such that we check whether 
X lies in all possible dominating boxes B in descending order of the volume of 
the part of BB^ that is dominated by B. This way we intuitively speed up all 
"unsuccessful" searches, i.e., all samples where X is indeed dominated. On real 
instances this yields a speedup of small constant factor. 

3.4.3 Exact calculation: 

As an involved sampling algorithm only makes sense for large instances, our 
implementation uses a classical exact algorithm for small n and d. The difficulty 
is to decide when to do so. Our approach works as follows. After we determined 
the boxes that dominate the lower vertex of a bounding box BB^, i.e., the boxes 
that "influence" the contribution of A, some of those sets of influencing boxes 
have quite a small cardinality ua- Especially boxes with small contribution tend 
to have only a small number of influencing boxes. Hence if this number is small 
we can determine the contribution exactly by using some classical hypervolume 
algorithm. In this case, we just restrict the ua influencing boxes to the bounding 
box BB^ and solve the induced HYP problem (inside BB^) to get a volume v. 
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After that, we subtract v from vol(BB^) which gives us the correct contribution 
of A 

This can be done for any box A with a smah number of influencing boxes 
After calculating its contribution exactly we just have to set A{A) := 
which works fine in our algorithm. The only problem is to decide which values 
of ua are to be considered "small" in this respect. For each box A we count 
how many elementary operations we made so far for sampling in its bound- 
ing box (by counting how many coordinate comparisons we made), calling this 
number noOps(74). We also try to estimate the runtime (number of elemetary 
operations) we would need for computing the contribution of A exactly. This, 
of course, depends on the algorithm one uses. We use the well-known HSO 
algorithm by Zitzler [20] and the algorithm by Beume and Rudolph [2 which 
we will call BR, but one can, of course, use an arbitrary exact hypervolume 
algorithm. For those algorithms we can bound the runtime by 0{n{^~^^~'^)) for 
HSO [18] and 0(nlogn + n^^'^) for BR By approximating the hidden con- 
stant hidden in the asymptotic notation, we get an upper bound of the number 
of operations the two algorithms make. Having this, we can at any point in 
the algorithm decide to compute a contribution exactly rather than continue 
to sample in it. That is, if noOps(A) > estimatedRuntimeHSO(nA, <i) we just 
compute it exactly. This way we need only twice as much time as if we had 
computed the contribution exactly right from the start. Also, if we needed only 
a small number of samples more to throw A out of the race, we only needed 
twice as much time overall computing the contribution exactly than continuing 
to sample. Hence, by this decision we always need at most twice the number of 
operations we would have needed with the optimal decision. This also implies 
that asymptotically our runtime with this heuristic is upper bounded by the 
minimum of HSO and BR. 

Note that this improvement changes nothing for high dimensions (say, d > 
20) as both exact algorithms quickly become unusable for these cases. The 
observed power of our algorithm for high dimensions (like d = 100) comes from 
the sampling, not from the combination with the exact algorithms. 

4 Experimental analysis 

To demonstrate the performance of the described approximation algorithm for 
the hypervolume contribution, we have implemented it and measured its perfor- 
mance on different datasets. We now first describe the used benchmark datasets 
and then our results. 

4.1 Datasets 

We used five different fronts similar to the DTLZ test suite [6^. As we do not 
want to compare the hypervolume algorithms for point distributions specific to 
different optimizers like NSGA-II [5 or SPEA2 [2^, we have sampled the points 
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(a) Spherical dataset. (b) Linear dataset. (c) Concave dataset. 

Figure 1: Visualization of the first three datasets. 
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(a) Spherical dataset. (b) Linear dataset. (c) Concave dataset. 

Figure 2: Experimental results for d = 3. 
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(a) Spherical dataset. (b) Linear dataset. (c) Concave dataset. 

Figure 3: Experimental results for = 10. 
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(a) Spherical dataset. (b) Linear dataset. (c) Concave dataset. 

Figure 4: Experimental results for d = 100. 
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from different surfaces randomly. This allows full scalability of the datasets in 
the number of points and the number of dimensions. 

To define the datasets, we use random variables with two different distri- 
butions. Simple uniformly distributed random variables are provided by the 
build-in random number generator rand() of C++. To get random variables 
with a Gaussian distribution, we used the polar form of the Box-Muller trans- 
formation as described in [T4| . 



4.1.1 Linear dataset: 

The first dataset consists of points 
They are obtained by generating d ( 
then using the normalized points 

(xi,X2,...,Xn) : 



(xi, X2, . . . , Xrf) e [0, 1]"^ with x;i=i = 1. 

Gaussian random variables yi, ^2, Vd and 

^ {\yi\Ay2\^'"^\yn\) 
|yi| + |y2| + ... + |ydr 



4.1.2 Spherical dataset: 

To obtain uniformly distributed points (xi, X2, . . . , Xd) G [0, 1]^ with Ylt=i — 
1 we follow the method of Muller [TO . That is, we generate d Gaussian random 
variables ^i, ^2, Vd and take the points 



(|^lUy2|,...,|^n|) 



(xi , X2 , . . . , 3^n) • / — 9 — ; 2 2* 



4.1.3 Concave dataset: 

Analogously to the spherical dataset we choose points (xi,X2, . . . ^Xd) G [0, 1]^ 
with Y^^^i yjxi — 1. For this, we generate again d Gaussian random variables 
V\i ^2, Vd and use the points 

/ N (|^lU^2|,...,|^n|) 

(Xi,X2, ...,Xn)-- 



For d = 3, the surface of the dataset is shown in Figure [T] Additionally to 
random points lying on a lower-dimensional surface, we have also examined the 
following two datasets with points sampled from the actual space similar to the 
random dataset examined by While et al. [18]. 



4.1.4 Random dataset 1: 

We first draw n uniformly distributed points from [0,1]^ and then replace all 
dominated points by new random points until we have a set of n nondominated 
points. 
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(a) Random dataset 1. (b) Random dataset 2. 

Figure 5: Experimental results for random datasets with d = 5. 
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(a) Random dataset 1. (b) Random dataset 2. 

Figure 6: Experimental results for random datasets with d = 100. 
4.1.5 Random dataset 2: 

Very similar to the previous dataset, we choose random points until there are 
no dominated points. The only difference is that this time the points are not 
drawn uniformly, but Gaussian distributed in with mean 1. 

Note that the last two datasets are far from being uniformly distributed. The 
points of the first set all have at least one coordinate very close to 1 while the 
points of the second set all have at least one coordinate which is significantly 
above the mean value. This make their computation for many points (e.g., 
n ^ 100) in small dimension (e.g., d ^ 5) computationally very expensive as it 
becomes more and more unlikely to sample a nondominated point. 

4.2 Comparison 

We have implemented our algorithm in C++ and compared it with the available 
implementations of HSO by Eckart Zitzler [20 and BR by Nicola Beume [2 . 
We did not add any further heuristics to both exact algorithms as all published 
heuristics do not improve the asymptotic runtime and even a speedup of a few 
magnitudes does not change the picture significantly. 

It would be better to compare our approximation algorithm with other ap- 
proximation algorithms instead of exact algorithms. However, the only other 
published approximation algorithm seems to be [Ij, which is not publicly avail- 
able yet. Another reason is that all available optimization algorithms based 
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on the hypervolume indicator use exact calculations and hence our speedup is 
carried over to them. 

All experiments were run on a cluster of 100 machines with two 2.4 GHz 
AMD Opteron processors, operating in 32-bit mode, running Linux. For our 
approximation algorithm we used the parameters S = 10~^ and e = 10~^. The 
code used is available upon request and will be distributed from the homepage 
of the second author. 

Figure [2]|6] show double- logarithmic plots of the runtime for different datasets 
and number of dimensions. The shown values are the median of 100 runs 
each. To illustrate the occurring deviations below and above the median, we 
also plotted all measured runtimes as ligther single points in the back. As 
both axes are scaled logarithmically, also the examined problem sizes are dis- 
tributed logarithmically. That is, we only calculated Pareto sets of size n if 
n e {[exp(/c/100)J | A: G N}. We examined dimensions d = 3,10,100 for the 
first three datasets and d = 5, 100 for the last two datasets. 

Independent of the number of solutions and dimension, we always observed 
that, unless n ^ 10, our algorithm outperformed HSO and BR substantially. On 
the used machines this means that only if the calculation time was insignificant 
(say, below 10~^ seconds), the exact algorithm could compete. On the other 
hand, the much lower median of our algorithm also comes with a much higher 
empirical standard deviation and interquartile range. In fact, we observed that 
the upper quartile can be up to five times slower than the median (for the 
especially degenerated random dataset 1). The highest ratio observed between 
the maximum runtime and the average runtime is 66 (again for the random 
dataset 1). This behavior is represented in the plots by the spread of lighter 
datapoints in the back of the median. However, there are not too many outliers 
and even their runtime outperforms HSO and BR. The non-monotonicity of 
our algorithm around n = 10 for d = 10 is caused by the approximate for the 
runtimes of the exact algorithms. 

For larger dimensions the advantage of our approximation algorithm becomes 
tremendous. For d = 100 we observed that within 100 seconds our algorithm 
could solve all problems with less than 6000 solutions while HSO an BR could 
not solve any problem for a population of 6 solutions in the same time. For 
example for 7 solutions on the 100-dimensional linear front, HSO needed 13 
minutes, BR 7 hours while our algorithm terminated within 0.5 milliseconds. 

5 Conclusions 

We have proven that most natural questions about the hypervolume contri- 
bution which are relevant for evolutionary multi-objective optimizers are not 
only computationally hard to decide, but also hard to approximate. On the 
other hand, we have presented a new approximation algorithm which works 
extremely fast for all tested practical instances. It can solve efficiently large 
high-dimensional instances {d ^ 10, n ^ 100) which are intractable for all pre- 
vious exact algorithms and heuristics. 
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It would be very interesting to compare the algorithms on further datasets. 
We believe that only when two solutions have contributions of very close value, 
our algorithm slows down. For practical instances this should not matter as it 
simply occurs too rarely - but this conjecture should be substantiated by some 
broader experimental study in the future. 
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